Overview

This document carries out all analysis for the Utah Aquatic Insect Decline project. Input files are:

  1. longterm_site_richness.csv: yearly average richness and abundance data for each site with at least 4 years of data (n = 142 sites)

  2. longterm_comid_richness.csv: yearly average richness and abundance data for each COMID segment with at least 4 years of data (n = 211 COMIDs)

  3. taxa_densities.csv: abundance and density info for each taxa in each sample (pre-richness calculations) - used for investigating specific taxa of interest.

Sections are:

  1. Initial data visualizations & basic analysis
  1. Taxa of interest
  1. Effects of sampling design (lab split, field split, area sampled) on total sample density

Read in datafiles:

site_inverts <- read.csv("cleaned_data/longterm_site_richness.csv")
comid_inverts <- read.csv("cleaned_data/longterm_comid_richness.csv")
taxa_densities <-read.csv("cleaned_data/taxa_densities.csv")%>%
  mutate(sampleDate =ymd(sampleDate))

I. Initial data visualizations & Basic LM’s

A. By Site:

i. Checking distributions:

Richness:

ggplot(site_inverts, aes(x = richness_xbar))+
  geom_histogram(fill = "Grey", color = "Black", bins = 70)+
  theme_classic()+
  labs(x = "Mean Richness", y = "Count")

Data are fairly normally distributed, maybe slightly bimodal.

Density:

ggplot(site_inverts, aes(x = density_xbar))+
  geom_histogram(fill = "Grey", color = "Black", bins = 70)+
  theme_classic()+
  labs(x = "Mean Total Density", y = "Count")

Strongly left-skewed - try a log transform:

ggplot(site_inverts, aes(x = log10(density_xbar)))+
  geom_histogram(fill = "Grey", color = "Black", bins = 70)+
  theme_classic()+
  labs(x = "log10(Mean Total Density)", y = "Count")

Better, still slightly skewed - will use log transform for all density plots for now.

B. By COMID:

i. Checking distributions

Richness:

ggplot(comid_inverts, aes(x = richness_xbar))+
  geom_histogram(fill = "Grey", color = "Black", bins = 70)+
  theme_classic()+
  labs(x = "Mean Richness", y = "Count")

Same as above, might be slightly more bimodal.

Density:

ggplot(comid_inverts, aes(x = density_xbar))+
  geom_histogram(fill = "Grey", color = "Black", bins = 70)+
  theme_classic()+
  labs(x = "Mean Total Density", y = "Count")

Still very left-skewed; use a log transform:

ggplot(comid_inverts, aes(x = log10(density_xbar)))+
  geom_histogram(fill = "Grey", color = "Black", bins = 70)+
  theme_classic()+
  labs(x = "log10(Mean Total Density)", y = "Count")

II. Checking taxa of interest

A. Pteronarcys californica

First, extract all observations of P. californica into a new dataframe:

p_cal <- taxa_densities%>%
  filter(scientificName == "Pteronarcys californica", 
         !splitCount == 0) #dropping samples where 0 individuals were counted

i. How is density changing over time across all sites?

Visualization:

ggplot(p_cal, aes(x = sampleDate, y = taxa_density))+
  geom_point(size = 0.5)+
  geom_smooth(size = 0.5)+
  geom_smooth(se = F, method = "lm", color = "Red", linetype = "dashed")+
  theme_classic()+
  labs(x = "Sample Date", y = "P. Californica Density (individuals/m2)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Basic LM:

pcal.dens.lm <- lm(taxa_density~sampleDate, p_cal)
summary(pcal.dens.lm)
## 
## Call:
## lm(formula = taxa_density ~ sampleDate, data = p_cal)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -47.26 -28.68 -10.78  13.54 205.20 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 109.765792  49.878237   2.201   0.0329 *
## sampleDate   -0.004859   0.003462  -1.403   0.1674  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 45.62 on 45 degrees of freedom
## Multiple R-squared:  0.04193,    Adjusted R-squared:  0.02064 
## F-statistic: 1.969 on 1 and 45 DF,  p-value: 0.1674

Non-significant decrease in P. californica density over time (ignoring effects of site, etc.)

ii. Is prevalence (proportion of sites w/ P. calif. per year) changing over time?

Calculate # unique sites with P. calif. each year:

pCal_yrs <- p_cal%>%
  mutate(yr = year(sampleDate))%>% #create new column with year
  group_by(yr)%>% #group by year
  summarise(density_xbar = mean(taxa_density), #calculate mean density, # sites with P. Cal., and total abundance for each year
            n_sites = length(unique(siteId)), 
            tot_abund = sum(splitCount))

Add total sites each year and calculate % of sites w/ P. cali.:

# Calculate total sites per year:
sites <- taxa_densities%>%
  mutate(yr = year(sampleDate))%>% #Add year colum
  group_by(yr)%>% #group by year
  summarise(total_sites = length(unique(siteId))) #count # unique siteId's per year

#Merge with P cali data, calculate % sites with P. calif. each year: 

pCal_yrs <- merge(pCal_yrs, sites)%>%
  mutate(site_perc = (n_sites/total_sites)*100)

Visualization:

ggplot(pCal_yrs, aes(x = yr, y = site_perc))+
  geom_point()+
  geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
  geom_smooth(size = 0.5)+
  theme_classic()+
  labs(x = "Year", y = "% of sites with P. californica present")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Basic LM:

pcal.site.lm <- lm(site_perc~yr, pCal_yrs)
summary(pcal.site.lm)
## 
## Call:
## lm(formula = site_perc ~ yr, data = pCal_yrs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9849 -0.6676 -0.2487  0.5601  1.9133 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39.88790   66.07993  -0.604    0.556
## yr            0.02061    0.03289   0.627    0.542
## 
## Residual standard error: 0.8528 on 13 degrees of freedom
## Multiple R-squared:  0.02933,    Adjusted R-squared:  -0.04533 
## F-statistic: 0.3928 on 1 and 13 DF,  p-value: 0.5417

No change in # sites with P. californica present over time.

iii. Is total abundance changing over time?

Visualization:

ggplot(pCal_yrs, aes(x = yr, y = tot_abund))+
  geom_point()+
  geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
  geom_smooth(size = 0.5)+
  theme_classic()+
  labs(x = "Year", y = "Total P. californica abundance")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

No obvious trends in total abundance over time.

iv. Is mean density changing over time?

ggplot(pCal_yrs, aes(x = yr, y = density_xbar))+
  geom_point()+
  geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
  geom_smooth(size = 0.5)+
  annotate(geom = "label", x = 2019, y = 80, label = "P = 0.035; R2 = 0.30")+
  theme_classic()+
  labs(x = "Year", y = "Average P. californica density (# individuals/m2)")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Basic LM:

pcal.dens.yr <- lm(density_xbar~yr, pCal_yrs)

summary(pcal.dens.yr)
## 
## Call:
## lm(formula = density_xbar ~ yr, data = pCal_yrs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.463 -16.434   8.193  12.534  43.442 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 4260.4680  1805.1872    2.36   0.0346 *
## yr            -2.1028     0.8985   -2.34   0.0359 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.3 on 13 degrees of freedom
## Multiple R-squared:  0.2965, Adjusted R-squared:  0.2423 
## F-statistic: 5.478 on 1 and 13 DF,  p-value: 0.03586

Significant decrease in mean density over time - When averaged across all sites with P. californica observations, density (# individuals/m2) has decreased between 2000 and 2023.

B. Zapada

First, extract all observations of Zapada into a new dataframe:

zapada <- taxa_densities%>%
  filter(sampleDate > "1990-01-01")%>% #dropping observations before 1990
  filter(genus == "Zapada", 
         !splitCount == 0) #dropping samples where 0 individuals were counted

Check Distribution:

ggplot(zapada, aes(x = log10(taxa_density)))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Strong left skew - use log transform.

i. How is density changing over time across all sites?

Visualization:

ggplot(zapada, aes(x = sampleDate, y = log10(taxa_density)))+
  geom_point(size = 0.5)+
  geom_smooth(size = 0.5)+
  geom_smooth(se = F, method = "lm", color = "Red", linetype = "dashed")+
  theme_classic()+
  labs(x = "Sample Date", y = "log10(Zapada Density (individuals/m2))")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'

Basic LM:

zap.dens.lm <- lm(log10(taxa_density)~sampleDate, zapada)
summary(zap.dens.lm)
## 
## Call:
## lm(formula = log10(taxa_density) ~ sampleDate, data = zapada)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.78419 -0.55566 -0.02905  0.54080  2.24495 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.686e+00  1.053e-01   25.50   <2e-16 ***
## sampleDate  -7.344e-05  7.247e-06  -10.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7452 on 1915 degrees of freedom
## Multiple R-squared:  0.05091,    Adjusted R-squared:  0.05041 
## F-statistic: 102.7 on 1 and 1915 DF,  p-value: < 2.2e-16

Significant decrease in P. californica density over time (ignoring effects of site, etc.)

ii. Is prevalence (% sites w/ Zapada per year) changing over time?

Calculate # unique sites with Zapada each year:

zapada_yrs <- zapada%>%
  mutate(yr = year(sampleDate))%>% #create new column with year
  group_by(yr)%>% #group by year
  summarise(density_xbar = mean(taxa_density), #calculate mean density, # sites with P. Cal., and total abundance for each year
            n_sites = length(unique(siteId)), 
            tot_abund = sum(splitCount))

Add total # sites per year and calculate %:

zapada_yrs <- merge(zapada_yrs, sites)%>%
  mutate(site_perc = (n_sites/total_sites)*100)

Visualization:

ggplot(zapada_yrs, aes(x = yr, y = site_perc))+
  geom_point()+
  geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
  geom_smooth(size = 0.5)+
  theme_classic()+
  labs(x = "Year", y = "Number of sites with Zapada present")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Basic LM:

zap.site.lm <- lm(site_perc~yr, zapada_yrs)
summary(zap.site.lm)
## 
## Call:
## lm(formula = site_perc ~ yr, data = zapada_yrs)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.7708  -9.1133   0.4998   5.1884  30.8810 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -221.2326   527.2285  -0.420    0.678
## yr             0.1236     0.2626   0.471    0.642
## 
## Residual standard error: 11.99 on 27 degrees of freedom
## Multiple R-squared:  0.008141,   Adjusted R-squared:  -0.02859 
## F-statistic: 0.2216 on 1 and 27 DF,  p-value: 0.6416

No change in % of sites with Zapada present over time.

iii. Is total abundance changing over time?

Visualization:

ggplot(zapada_yrs, aes(x = yr, y = tot_abund))+
  geom_point()+
  geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
  geom_smooth(size = 0.5)+
  theme_classic()+
  labs(x = "Year", y = "Total Zapada abundance")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Basic LM:

zap.abund.lm <- lm(tot_abund~yr, zapada_yrs)
summary(zap.abund.lm)
## 
## Call:
## lm(formula = tot_abund ~ yr, data = zapada_yrs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1270.2  -629.7  -479.6   515.6  3573.7 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -36828.23   49341.11  -0.746    0.462
## yr              18.90      24.57   0.769    0.448
## 
## Residual standard error: 1122 on 27 degrees of freedom
## Multiple R-squared:  0.02145,    Adjusted R-squared:  -0.0148 
## F-statistic: 0.5917 on 1 and 27 DF,  p-value: 0.4484

No change in Zapada abundance over time.

iv. Is mean density changing over time?

ggplot(zapada_yrs, aes(x = yr, y = density_xbar))+
  geom_point()+
  geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
  geom_smooth(size = 0.5)+
  annotate(geom = "label", x = 2019, y = 750, label = "P = 0.036, R2 = 0.15")+
  theme_classic()+
  labs(x = "Year", y = "Average Zapada density (# individuals/m2)")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Basic LM:

zap.dens.yr <- lm(density_xbar~yr, zapada_yrs)

summary(zap.dens.yr)
## 
## Call:
## lm(formula = density_xbar ~ yr, data = zapada_yrs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -305.25  -72.70  -10.54   17.09  580.20 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 16525.295   7405.867   2.231   0.0342 *
## yr             -8.133      3.688  -2.205   0.0361 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 168.4 on 27 degrees of freedom
## Multiple R-squared:  0.1526, Adjusted R-squared:  0.1212 
## F-statistic: 4.863 on 1 and 27 DF,  p-value: 0.03615

Significant decrease in mean density over time - When averaged across all sites with Zapada observations, density (# individuals/m2) has decreased between 2000 and 2023.

v. Zapada above 3000m:

First, select all sites with elevation >=3000m from zapada dataframe:

zapada_hi <- zapada %>% 
  filter(elev_m >=3000)

length(unique(zapada_hi$siteId))
## [1] 28

Only 3 sites above 3000m - probably can’t do much with that.

Alternative: identify 10 highest sites, extract data from only those sites:

zap_hiSites <- zapada %>%
  select(siteId, siteLongitude, siteLatitude,elev_m)%>% #select site and elevation cols
  distinct()%>% #remove duplicates
  arrange(desc(elev_m))%>% #sort by elevation, largest to smallest
  slice(1:10) #extract first 10 sites

zap_hi10 <- zapada %>%
  filter(siteId%in%zap_hiSites$siteId) #create new dataframe with Zapada data from 10 highest sites

Check distribution of density at these sites:

ggplot(zap_hi10, aes(x = log10(taxa_density)))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Strong left skew, use log transform.

Is density changing at these sites?

ggplot(zap_hi10, aes(x = sampleDate, y = log10(taxa_density)))+
  geom_point(size = 0.5)+
  #geom_smooth(size = 0.5)+
  geom_smooth(se = F, method = "lm", color = "Red", linetype = "dashed")+
  theme_classic()+
  labs(x = "Sample Date", y = "Zapada Density (individuals/m2)")
## `geom_smooth()` using formula 'y ~ x'

No clear trends.

Are there trends in prevalence, total abundance, or mean density over time?

Calculate number of sites w/ high elevation Zapada per year, mean density and total abundance per year:

zapHi_yrs <- zap_hi10%>%
  mutate(yr = year(sampleDate))%>% #create new column with year
  group_by(yr)%>% #group by year
  summarise(density_xbar = mean(taxa_density), #calculate mean density, # sites with P. Cal., and total abundance for each year
            n_sites = length(unique(siteId)), 
            tot_abund = sum(splitCount))

Add total sites each year and calculate % of sites w/ Zapada:

zapHi_yrs_pivot <- merge(zapHi_yrs, sites)%>% #merge with site numbers calculated above
  mutate(site_perc = (n_sites/total_sites)*100)%>% #calculate % sites w/ high elevation zapada
  pivot_longer(!yr, names_to = "measure", values_to = "value")%>% #pivot longer for plotting
  filter(!measure == "n_sites", !measure == "total_sites")

Visualization:

ggplot(zapHi_yrs_pivot, aes(x = yr, y = value, color = measure))+
  geom_point(size = 0.5)+
  geom_smooth(size = 0.5)+
  geom_smooth(se = F, method = "lm", linetype = "dashed")+
  theme_classic()+
  facet_wrap(~measure, scales = "free")+
  labs(x = "Sample Date", y = "")+
  theme(legend.position = "none")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

LM for density:

zapHi.dens.lm <- lm(density_xbar ~ yr, zapHi_yrs)
summary(zapHi.dens.lm)
## 
## Call:
## lm(formula = density_xbar ~ yr, data = zapHi_yrs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -548.87 -196.70  -33.94  173.37  938.63 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 86019.54   43592.81   1.973   0.0891 .
## yr            -42.67      21.70  -1.966   0.0900 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 465.5 on 7 degrees of freedom
## Multiple R-squared:  0.3558, Adjusted R-squared:  0.2637 
## F-statistic: 3.866 on 1 and 7 DF,  p-value: 0.09001

No Significant trends.

vi. Map of the 10 highest zapada sites

First, get map data for Utah:

utah <- map_data("state", region = "utah") #state polygon
ut.counties <- map_data("county", region = "utah") #county polygons

Make map of Utah with the 10 highest Zapada sites marked:

zap.map <- ggmap(get_map(c(-111.950684,39.419220), source = "stamen", zoom = 7, maptype = "terrain-background"))+ #basemap: terrain background (no labels) from google maps platform console, centered on UT centroid
  geom_polygon(data = ut.counties, aes(x = long, y = lat, group = group), color = "black", fill = "white", size = 0.5, alpha = 0.0, linetype = "dotted")+ #add counties to map, use dotted lines
  geom_polygon(data = utah, aes(x = long, y = lat, group = group), color = "black", fill = "white", alpha = 0.0)+ #add state outline to map, use solid line
  geom_point(data = zap_hiSites, aes(x = siteLongitude, y = siteLatitude, color = elev_m, group = NULL), size = 4, pch = 20)+ #add zapada sites, color code by elevation. 
  labs(color = "Site elevation (m)", title = "10 highest sites with Zapada")+ #updating legend title
  annotate(geom = "text", x = -111.8, y = 39, label = "Utah", size = 6, fontface = 4, color = "black", alpha = 0.75)+ #add "Utah" label to center of map
  scale_color_gradient(low = "#DCB0FC", high = "#7004BF")+ #setting color gradient for site colors
  theme_void()+ #removing lat/long from x and y axes 
  theme(legend.title = element_text(size = 10, face = "bold"), #adjusting legend title/text/plot title size, face, and position
        legend.text = element_text(size = 10), 
        legend.position = "right", 
        plot.title = element_text(size = 12, face = "bold", hjust = 0.5) 
        )
## ℹ <]8;;https://maps.googleapis.com/maps/api/staticmap?center=39.41922,-111.950684&zoom=7&size=640x640&scale=2&maptype=terrain&key=xxx-cc_ziVfJIReObeHLX39pYWWq8https://maps.googleapis.com/maps/api/staticmap?center=39.41922,-111.950684&zoom=7&size=640x640&scale=2&maptype=terrain&key=xxx-cc_ziVfJIReObeHLX39pYWWq8]8;;>
## ℹ Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
zap.map

Save this plot:

ggsave("plots/zapada_map.pdf", zap.map, device = "pdf", units = "in", width = 6.5, height = 4.5, dpi = "retina")

Try making a map color coded by site trend?

III. Effects of sampling design (lab split, field split, area sampled) on total sample density

First, need to calculate total density in each sample:

total_dens <- taxa_densities%>%
  group_by(sampleId)%>% # group by sample ID
  summarise(labSplit = mean(labSplit), #Add sample lab split, field split, and area
            fieldSplit = mean(fieldSplit), 
            tot_dens = sum(taxa_density), # calculate total density in sample
            area = mean(area))%>%
  mutate(sample_prop = labSplit*fieldSplit) #calculate sample proportion (labSplit*fieldSplit)

A. Density vs. LabSplit:

Basic plot:

dens.labsplit <- ggplot(filter(total_dens, !labSplit == 1), aes(x = labSplit, y = tot_dens))+
  geom_point(size = 0.5, alpha = 0.5)+
  geom_smooth(size = 0.5, color = "Red", linetype = "dashed")+
  theme_classic()+
  labs(y = "Total Sample Density", x = "")
dens.labsplit
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Log transformed plot:

dens.labsplit.log <- ggplot(total_dens, aes(x = labSplit, y = log10(tot_dens)))+
  geom_point(size = 0.5, alpha = 0.5)+
  geom_smooth(size = 0.5, color = "Red", linetype = "dashed", method = "lm")+
  annotate(geom = "label", x = 0.5, y = 0.2, label = "P <0.001; R2 = 0.55")+
  theme_classic()+
  labs(y = "log10(Total Sample Density)", x = "Lab Split Proportion")
dens.labsplit.log
## `geom_smooth()` using formula 'y ~ x'

Basic LM:

dens.ls.lm <- lm(log10(tot_dens)~labSplit, total_dens)
summary(dens.ls.lm)
## 
## Call:
## lm(formula = log10(tot_dens) ~ labSplit, data = total_dens)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.71463 -0.26410 -0.05217  0.29127  1.67527 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.17922    0.01133  368.97   <2e-16 ***
## labSplit    -1.47447    0.01710  -86.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5062 on 6112 degrees of freedom
## Multiple R-squared:  0.5489, Adjusted R-squared:  0.5488 
## F-statistic:  7436 on 1 and 6112 DF,  p-value: < 2.2e-16

Significant negative relationship between lab split and total sample density.

B. Density vs. Sample Proportion (lab split * field split):

Basic plot:

dens.sampleprop <- ggplot(total_dens, aes(x = sample_prop, y = tot_dens))+
  geom_point(size = 0.5, alpha = 0.5)+
  geom_smooth(size = 0.5, color = "Red", linetype = "dashed")+
  theme_classic()+
  labs(y = "", x = "")
dens.sampleprop
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Log transformed plot:

dens.sampleprop.log <- ggplot(total_dens, aes(x = sample_prop, y = log10(tot_dens)))+
  geom_point(size = 0.5, alpha = 0.5)+
  geom_smooth(size = 0.5, color = "Red", linetype = "dashed", method = "lm")+
  annotate(geom = "label", x = 0.5, y = 0.2, label = "P <0.001; R2 = 0.55")+
  theme_classic()+
  labs(y = "", x = "Sample Proportion")
dens.sampleprop.log
## `geom_smooth()` using formula 'y ~ x'

Basic LM:

dens.sp.lm <- lm(log10(tot_dens)~sample_prop, total_dens)
summary(dens.sp.lm)
## 
## Call:
## lm(formula = log10(tot_dens) ~ sample_prop, data = total_dens)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.71430 -0.26338 -0.05351  0.29128  1.67559 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.17803    0.01131  369.27   <2e-16 ***
## sample_prop -1.47361    0.01709  -86.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5062 on 6112 degrees of freedom
## Multiple R-squared:  0.5489, Adjusted R-squared:  0.5488 
## F-statistic:  7438 on 1 and 6112 DF,  p-value: < 2.2e-16

Significant negative relationship between sample proportion and total sample density.

C. Sample Area

Basic plot:

dens.area <- ggplot(filter(total_dens, area <50), aes(x = area, y = tot_dens))+
  geom_point(size = 0.5, alpha = 0.5)+
  geom_smooth(size = 0.5, color = "Red", linetype = "dashed")+
  theme_classic()+
  labs(y = "", x = "")
dens.area
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Log transformed plot:

dens.area.log <- ggplot(filter(total_dens, area <50), aes(x = area, y = log10(tot_dens)))+
  geom_point(size = 0.5, alpha = 0.5)+
  geom_smooth(size = 0.5, color = "Red", linetype = "dashed", method = "lm")+
  annotate(geom = "label", x = 1.75, y = 0.1, label = "P <0.001; R2 = 0.10")+
  theme_classic()+
  labs(y = "", x = "Sample area, m2")
dens.area.log
## `geom_smooth()` using formula 'y ~ x'

Basic LM:

dens.ar.lm <- lm(log10(tot_dens)~area, filter(total_dens, area <50))
summary(dens.ar.lm)
## 
## Call:
## lm(formula = log10(tot_dens) ~ area, data = filter(total_dens, 
##     area < 50))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1364 -0.3778  0.1454  0.5158  1.4501 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.73072    0.01646  226.61   <2e-16 ***
## area        -0.62620    0.02428  -25.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7157 on 6112 degrees of freedom
## Multiple R-squared:  0.09816,    Adjusted R-squared:  0.09802 
## F-statistic: 665.3 on 1 and 6112 DF,  p-value: < 2.2e-16

Significant negative relationship between sample proportion and total sample density.

D. Create summary figure of all three:

dens.samplemethod <- (dens.labsplit/dens.labsplit.log)|(dens.sampleprop/dens.sampleprop.log)|(dens.area/dens.area.log)
dens.samplemethod

Save:

ggsave("plots/dens_samplemethod.pdf", dens.samplemethod, device = "pdf", units = "in", height = 6.5, width = 9, dpi = "retina")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
Summary:

log10(Density) has a significant negative correlation with lab split, sample proportion (lab split * field split), and area. The R2 for lab split and sample proportion are fairly hight (0.55), while the R2 for area is very low (0.1).